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Abstract — Interpolating scattered data points is a problem of 
wide ranging interest. Ordinary kriging is an optimal scat- 
tered data estimator, widely used in geosciences and remote 
sensing. A generalized version of this technique, called cok- 
riging, can be used for image fusion of remotely sensed data. 
However, it is computationally very expensive for large data 
sets. We demonstrate the time efficiency and accuracy of ap- 
proximating ordinary kriging through the use of fast matrix- 
vector products combined with iterative methods. We used 
methods based on the fast Multipole methods and nearest 
neighbor searching techniques for implementations of the fast 
matrix-vector products. 


Keywords— geostatistics, image fusion, kriging, approximate 
algorithms, fast multipole methods, fast Gauss transform, 
nearest neighbors, iterative methods. 
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1. Introduction 


images involved [6,7]. A generalized version of the ordinary 
kriging, called cokriging [3, 5], can be used for image fusion 
of multi-sensor data [8,9]. Unfortunately, ordinary kriging 
interpolant is computationally very expensive to compute ex- 
actly. For n scattered data points, computing the value of 
a single interpolant involves solving a dense linear system 
of order n x n, which takes 0(n 3 ). This is infeasible for 
large n. Traditionally, kriging is solved approximately by lo- 
cal approaches that are based on considering only a relatively 
small number of points that lie close to the query point [3,5]. 
There are many problems with this local approach, however. 
The first is that determining the proper neighborhood size is 
tricky, and is usually solved by ad hoc methods such as select- 
ing a fixed number of nearest neighbors or all the points lying 
within a fixed radius. Such fixed neighborhood sizes may not 
work well for all query points, depending on local density 
of the point distribution [5]. Local methods also suffer from 
the problem that the resulting interpolant is not continuous. 
Meyer showed that while kriging produces smooth continu- 
ous surfaces, it has zero order continuity along its borders 
[10]. Thus, at interface boundaries where the neighborhood 
changes, the interpolant behaves discontinuously. Therefore, 
it is important to consider and solve the global system for 
each interpolant. However, solving such large dense systems 
for each query point is impractical. 

Recently an approximation approach to kriging has been 
proposed based on a technique called covariance tapering 
[11,12]. However, this approach is not efficient when covari- 
ance models have relatively large ranges. Also, finding a valid 
taper, as defined in [1 1], for different covariance functions is 
difficult and at times impossible (e.g. Gaussian covariance 
model). 


Scattered data interpolation is a problem of interest in numer- 
ous areas such as electronic imaging, smooth surface model- 
ing, and computational geometry [1,2]. Our motivation arises 
from applications in geology and mining, which often involve 
large scattered data sets and a demand for high accuracy. For 
such cases, the method of choice is ordinary kriging [3]. This 
is because it is a best unbiased estimator [3—5]. Also in re- 
mote sensing, image fusion of multi-sensor data is often used 
to increase either the spectral or the spatial resolution of the 
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In this paper, we address the shortcomings of the previous 
approaches through an alternative based on Fast Multipole 
Methods (FMM). The FMM was first introduced by Greengard 
and Rokhlin for fast multiplication of a structured matrix by 
a vector [13, 14]. If the Gaussian function is used for gen- 
erating the matrix entries, the matrix-vector product is called 
the Gauss transform. We use efficient implementations of the 
Gauss transform based on the FMM idea (see [15, 16]) in com- 
bination with the symmlq iterative method [17] for solving 
large ordinary kriging systems. Billings et al. [18] had also 
suggested the use of iterative methods in combination with 
the FMM for solving such systems. 

The remainder of this paper is organized as follows. Section 


1 


‘ ft 

' 2 describes the ordinary kriging system and solving it via an 

iterative method. In Section 3 we introduce the matrix- vector 
products involved in solving such systems. We mention two 
existing efficient and approximate implementations of such 
products in Section 4. Section 5 describes our data sets. Our 
experiments and results are presented in Sections 6 and 7 re- 
spectively. Section 8 concludes this paper. 


2. Ordinary Kriging via Iterative 
Methods 


Kriging is an interpolation method named after Danie Krige, 
a South African mining engineer [5]. Kriging and its vari- 
ants have been traditionally used in mining and geostatistics 
applications [3-5]. Kriging is also referred to as the Gaus- 
sian process predictor in the machine learning domain [19]. 
The most commonly used variant is called ordinary kriging , 
which is often referred to as a Best Linear Unbiased Esti- 
mator (BLUE) [3, 1 1]. It is considered to be best because it 
minimizes the variance of the estimation error. It is linear 
because estimates are weighted linear combination of avail- 
able data, and is unbiased since it aims to have the mean error 
equal to zero [3]. Minimizing the variance of the estimation 
error forms the objective function of an optimization prob- 
lem. Ensuring unbiasedness of the error imposes a constraint 
on this function. Formalizing this objective function with its 
constraint results in the following system [3,5, 12]. 


(CL 

V L T 0 


w 



( 1 ) 


where C is the matrix of points’ pairwise covariances, L is a 
column vector of all ones and of size n, and w is the vector 
of weights Therefore, the minimization prob- 

lem for n points reduces to solving a linear system of size 
(n + l) 2 , which is impractical for very large data sets via di- 
rect approaches. It is also important that matrix C be positive 
definite [3, 12]. Pairwise covariances are often modeled as 
a function of points’ separation. These functions should re- 
sult in a positive definite covariance matrix. Christakos [20] 
showed necessary and sufficient conditions for such permis- 
sible covariance functions. Two of these valid functions are 
the Gaussian and Spherical covariance functions [3,5,20]. In 
this paper, the Gaussian covariance function is used. For two 
points xi and xj, the Gaussian covariance function is defined 
as Cij = exp (— 3\\xi — Xj\\ 2 /a 2 ), where a is the range of 
the covariance function. 


For large data sets, it is impractical to solve the ordinary krig- 
ing system using direct approaches that take 0(n 3 ) time. It- 
erative methods generate a sequence of solutions which con- 
verge to the true solution in n iterations. In practice, however, 
we loop over k n iterations [21]. In particular, we used 
an iterative method called SYMMLQ which is appropriate for 
solving symmetric systems [17]. Note that the coefficient ma- 
trix in the ordinary kriging linear system while symmetric is 
not positive definite since it has a zero entry on its diagonal. 
Therefore, methods such as conjugate gradient are not appli- 
cable here [22]. The actual implementation of the SYMMLQ 


method requires one 0(n 2 ) matrix-vector multiplication per 
iteration. The storage is 0(n) since the matrix-vector multi- 
plication can use elements computed on the fly without stor- 
ing the matrix. Empirically the number of iterations required, 
fc, is generally small compared to n leading to a computa- 
tional cost of 0(fcn 2 ). 

3. Matrix- vector Multiplication 

The 0(kn 2 ) quadratic complexity is still too high for 
large data sets. The core computational step in each 
SYMMLQ iteration involves the multiplication of a matrix 
C with some vector, say q. For the Gaussian covari- 
ance model the entries of the matrix C are of the form 
[q y = exp (—3\\xi — Xj\\ 2 /a 2 ). Hence, the j th ele- 
ment of the matrix- vector product Cq can be written as 
(Cq)j = Ya=i Qi ex P ( — 3||xi — xj || 2 /a 2 ) -which is the 
weighted sum of n Gaussians each centered at Xi and eval- 
uated at Xj . 

Discrete Gauss transform (GT) 

The sum of multivariate Gaussians is known as the discrete 
Gauss transform in scientific computing. In general, for each 
target point {yj E (which in our case are the same as 

the source points xi) the discrete Gauss transform is defined 
as 

JL \\ x i—yj II 2 

<?(%) = 5>e (2) 

i—1 

where h (in our case h = a/yj 3) is called the bandwidth of 
the Gaussian. Evaluating discrete Gauss transforms for m 
target points due to n different source locations arises in may 
applications. In this paper, the Gauss transform (GT) refers to 
this direct implementation, which takes 0(mn) time. 


4. Fast Approximate Matrix-vector 
Multiplication 


Various fast approximation algorithms [15,23] have been pro- 
posed to compute the discrete Gauss transform in 0(m + n) 
time. These algorithms compute the sum to any arbitrary e 
precision. For any e > 0, we define G to be an e-exact ap- 
proximation to G if the maximum absolute error relative to 
the total weight Q = k*l * s upper bounded by e, i. e . , 


max 

yj 


\G( yj )-G( yj )\ 

Q 


< e. 


(3) 


The constant in 0(m + n) depends on the desired accuracy e , 
which however can be arbitrary. At machine precision there 
is no difference between the direct and the fast methods. The 
method relies on retaining only the first few terms of an infi- 
nite series expansion for the Gaussian function. These meth- 
ods are inspired by the fast multipole methods (FMM), origi- 
nally developed for the fast summation of the potential fields 
generated by a large number of sources, such as those arising 
in gravitational potential problems [13]. The fast Gauss trans- 
form (fgt) is a special case where FMM ideas were used for 
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the Gaussian potential [23]. The improved fast Gauss trans- 
form (IFGT) is a similar algorithm based on a single different 
factorization and data structure. It is suitable for higher di- 
mensional problems and provides comparable performance 
in lower dimensions [15]. 


£ 

H<p- 1 
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(5) 


Rearranging the terms Equation (5) can be written as 


Improved fast Gauss transform (IF GT) 

IFGT is an efficient algorithm for approximating the Gauss 
transform. The fast Gauss transform , first proposed by 
Greengard and Strain [23], is an £-exact approximation al- 
gorithm for the Gauss transform. This algorithm reduces the 
Gauss transform’s computational complexity from 0(mn ) to 
0(m + n). However, this algorithm’s constant factor grows 
exponentially with dimension d. Later improvements, includ- 
ing the IFGT algorithm, reduced this constant factor to asymp- 
totically polynomial order in terms of d. The IFGT algorithm 
was first introduced by Yang et al. [15]. Their implementa- 
tion did not use a sufficiently tight error bound to be useful 
in practice. Also, they did not adaptively select the necessary 
parameters to achieve the desired error bound. Raykar et al. 
later presented an approach that overcame these shortcom- 
ings [16,24]. We used the implementation due to Raykar et 
al We briefly describe some of the key ideas in IFGT. Please 
see [16,24] for more details. For any point x* E R d the Gauss 
Transform at yj can be written as. 


G(yj) = £ 
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where, 
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The coefficients C a can be evaluated separately in 0(n). 
Evaluation of G(yj) at m points is O(m). Hence the com- 
putational cost has reduced from the quadratic 0(nm) to the 
linear 0(n + m). We have omitted the constants in the com- 
putational cost. A detailed analysis of the cost can be seen in 
[16,24]. 
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In Equation (4) the first exponential inside the summation 
e -\\xi-x+\\ /h (j e p en( j s on jy on the source coordinates xi. 
The second exponential e _ ll^” x *ll / h depends only on the 
target coordinates yj. However for the third exponential, 
e 2 ( yj -x+)-(xi-x+)/h 2 ? t he source and target are entangled. The 
crux of the algorithm is to separate this entanglement via Tay- 
lor series using multi-index notation. The p-term truncated 
Taylor series expansion for e 2 (^ _x *)'( Xi “ x *)/ /l2 can be writ- 
ten as [16,24], 


o 2 (yj- x *)-( x i-x+)/h 2 

V — 
a \ 

M<p— i 
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The truncation number p is chosen based on the prescribed 
error e. Ignoring error terms for now G(yj) can be approxi- 
mated as, 


G(y j ) = ^ %e -H x *- I *ll 2 //‘ 2 e -l|yi-^.ll 2 /'> 2 
i= 1 


Thus far, we have used the Taylor series expansion about a 
certain point or*. However if we use the same x * for all 
the points we typically would require very high truncation 
number since the Taylor series is valid only in a small open 
ball around x*. The IFGT algorithm uses a data adaptive 
space partitioning scheme-the farthest point clustering algo- 
rithm [25]-to divide the n sources into K spherical clusters- 
and then build a Taylor series at the center of each cluster. 

The final algorithm has four stages. The first stage involves 
determining parameters of the algorithm based on specified 
error bounds, bandwidth, and data distribution. Second, the 
d-dimensional space is subdivided using a fc-center cluster- 
ing [25]. Next, a truncated representation of the Gaussian 
inside each cluster is built using a set of decaying basis func- 
tions. Finally, the influence of all the data in a neighborhood 
using coefficients at cluster centers are collected and the ap- 
proximate GT is evaluated. Please see [16,24] for details. 

GT with nearest neighbors (gtann) 

GTANN is also an efficient algorithm for calculating matrix- 
vector products. This method was implemented by 
Raykar [26], where it is referred to as the FigTree method. 
This method is most effective when the Gaussian models be- 
ing used have small ranges, while IFGT gives good speed-ups 
when dealing with large range values. 

First, based on the desired error bound, a search radius is cal- 
culated. Then, for each target point, source points within that 
radius are considered. Since the Gaussian function dissipates 
very rapidly, nearby points have the greatest influence. These 


3 


source points are calculated via fixed-radius nearest neighbor 
search routines of the ANN library [27]. Finally, for each tar- 
get point, their nearest neighbor source points are calculated 
in matrix- vector product calculations, involving the covari- 
ance matrix C in the ordinary kriging system. 

5. Data sets 

We generated three sets of sparse data sets. For the first set, 
the number of sampled points varied from 1000 up to 5000, 
while their covariance model had a small range value of 12. 
For the second set, we varied the number of samples in the 
same manner, except that the points’ covariance model had a 
larger range equal to 100. Finally, we sampled 5000 points 
from dense grids, where points’ covariance model had ranges 
equal to a — 12, 24, 100, 250, and 500. For each input data 
set we use 200 query points which are drawn from the same 
dense grid but are not present in the sampled data set. One 
hundred of these query points were sampled uniformly from 
the original grids. The other 100 query points were sampled 
from the same Gaussian distributions that were used in the 
generation of a small percentage of the sparse data. 

6. Experiments 

We used the SYMMLQ iterative method as our linear solver. 
We set the desired solutions’ relative error, or the convergence 
criteria for the SYMMLQ method, to €2 = 10” 3 . Thus, if 
SYMMLQ is implemented exactly, we expect the relative er- 
ror to be less than 10 -3 . The exact error is likely to be higher 
than that. We developed three versions of the ordinary krig- 
ing interpolator, each of which calculates the matrix-vector 
products involved in the SYMMLQ method differently. 

Gauss Transform (GT): Computes the matrix- vector prod- 

uct exactly. 

Improved Fast Gauss Transform (IFGT): Approximates 

the matrix-vector product to the e precision via the IFGT 
method mentioned in Section 4. 

Gauss transform with nearest neighbors ( GTANN); Ap- 

proximates the matrix-vector product using the GTANN 
method mentioned in Section 4. 

All experiments were run on a Sun Fire V20z running Red 
Hat Enterprise release 3, using the g++ compiler version 
3.2.3. Our software is implemented in C++ and uses the 
Geostatistical Template Library (GsTL) [28] and Approxi- 
mate Nearest Neighbor library (ANN) [27]. GsTL is used 
for building and solving the ordinary kriging systems, and 
ANN is used for finding nearest neighbors when using the 
GTANN approach. In all cases, for the IFGT and GTANN ap- 
proaches we required the approximate matrix- vector products 
to be evaluated within e = 10 -4 accuracy. All experiments’ 
results are averaged over five runs. We designed three experi- 
ments to study the effect of covariance ranges and number of 
data points on performances of our different ordinary kriging 
versions. 


Experiment 1: We varied number of scattered data points 

from 1000 up to 5000, with a small fixed Gaussian covariance 
model of range a = 12. 

Experiment 2: We varied number of sampled points. This 

time, points had a larger range equal to a = 100 for their 
covariance model. 

Experiment 3: Finally, we examined the effect of different 

covariance ranges equal to a = 12, 25, 100, 250, and 500 on 
a fixed data set of size 5000. 

7. Results 

In this section we compare both the quality of results and the 
speed-ups achieved for solving the ordinary kriging system 
via iterative methods combined with fast approximate matrix- 
vector multiplication techniques. 

Figure 1 presents results of the first experiment mentioned. 
When utilizing approximate methods IFGT and GTANN the 
exact residuals are comparable to those obtained from GT. 
The IFGT approach gave speed-ups ranging from 1. 3-7.6, 
while GTANN resulted in speed-ups ranging roughly from 50- 
150. This is mainly due to the fact that the covariance func- 
tion’s range is rather small. Since there are only a limited 
number of source points influencing each target point, collect- 
ing and calculating the influence of all source points for each 
target point (the IFGT approach) is excessive. The GTANN 
approach works well for such eases by considering only the 
nearest source points to each target point. In both cases, the 
speed-ups grow with number of data points. Algorithms per- 
form similarly for points from the Gaussian distribution to 
those from uniform distribution. 

Figure 2 shows results of the second experiment. While the 
GTANN approach did not result in significant speed-ups, the 
IFGT gave constant factor speed-ups ranging roughly from 1 .5 
to 7.5, as we increased the number of data points. The IFGT 
approach results in larger residuals for small data sets, and 
when solving the ordinary kriging system for query points 
from the uniform distribution. In particular, for n = 1000 
and n = 2000, the performance of IFGT is not acceptable with 
respect to the exact residuals calculated. This poor overall re- 
sult for these cases is because the SYMMLQ method did not 
meet its convergence criteria and reached maximum number 
of iterations. Increasing the required accuracy for the IFGT 
algorithm, by changing the e = 10 -4 to 10 -6 resolved this 
issue and gave residuals comparable to those obtained from 
the GT method. As the number of points increases, the qual- 
ity of results approaches those of the exact methods. For the 
query points from the Gaussian distribution, the quality of 
results are comparable to the exact method, when using the 
IFGT approach. The GTANN approach also results in compa- 
rable residuals to the exact methods in all cases. 

Figure 3 presents results of the last set of experiments. In all 
cases, the quality of results is comparable to those obtained 
from exact methods. The IFGT approach resulted in speed- 
ups of 7-15 in all cases. The GTANN approach is best when 
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used for covariance functions with small range values of 12, 
and 25. While the GTANN approach is slower than the exact 
methods for range values larger than 100, it results in speed- 
up factors of 151-153, and 47-49 for range values 12 and 25 
respectively. Thus, the GTANN approach is efficient for small 
range values, and so is the IFGT approach for large ranges. 

8. Conclusions 

We integrated efficient implementations of the Gauss Trans- 
form for solving ordinary kriging systems. We examined 
the effect of number of points and the covariance functions’ 
ranges on the running time for solving the system and the 
quality of results. The IFGT is more effective as number of 
data points increases. Our experiments using the IFGT ap- 
proach for solving the ordinary kriging system demonstrated 
speed-up factors ranging from 7-15 when using 5000 points. 
Based on our tests on varying number of data points, we 
expect even higher speed-up factors compared to the exact 
method when using larger data sets. In almost all cases, the 
quality of IFGT results are comparable to the exact meth- 
ods. The GTANN approach outperformed the IFGT method 
for small covariance range values, resulting in speed-up fac- 
tors as high as 153 and 49 respectively. The GTANN approach 
is slower than the exact methods for large ranges (100 and 
over in our experiments), and thus is not recommended for 
such cases. The quality of results for GTANN was compara- 
ble to the exact methods in all cases. Please see [26, 29] for 
details of methods used in this work. 

Future work involves efficient kriging via fast approximate 
matrix- vector products for other covariance functions, where 
a factorization exists. We also plan on using precondition- 
ers [21] for our iterative solver, so that the approximate ver- 
sions converge faster, and we obtain even higher speed-ups. 
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Figure 1. Experiment 1, Left: Average absolute errors. Right: Average CPU times 
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Figure 2. Experiment 2, Left: Average absolute errors. Right: Average CPU times 
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Figure 3. Experiment 3, Left: Average absolute errors. Right: Average CPU times 
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